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Q ' Abstract 
(D . 

^\ . Sequential Monte Carlo (SMC) methods are a widely used set of computational tools 

' for inference in non-linear non-Gaussian state-space models. We propose a new SMC 

'nI" , algorithm to compute the expectation of additive functionals recursively. Essentially, it 

^ ' is an online or "forward-only" implementation of a forward filtering backward smoothing 

I— i, SMC algorithm proposed in [IB]. Compared to the standard path space SMC estima- 

' tor whose asymptotic variance increases quadratically with time even under favourable 

, mixing assumptions, the asymptotic variance of the proposed SMC estimator only in- 

. ' creases linearly with time. This forward smoothing procedure allows us to implement 

, on-line maximum likelihood parameter estimation algorithms which do not suffer from 

' the particle path degeneracy problem. 

I— Some key words: Expectation-Maximization, Forward Filtering Backward Smooth- 
^ ' ing. Recursive Maximum Likelihood, Sequential Monte Carlo, Smoothing, State-Space 

^ ! Models. 

o : 
o\ . 

m ■ 1 Introduction 
vn ; 

CN ■ 1.1 State-space models and inference aims 

. State-space models (SSM) are a very popular class of non-linear and non-Gaussian time 

I series models in statistics, econometrics and information engineering; see for example [7], 

' [E]) An SSM is comprised of a pair of discrete-time stochastic processes, {X„}^>q and 

^ ' {^}n>0' where the former is an A'-valued unobserved process and the latter is a 3^-valued 

^ I process which is observed. The hidden process {X„}^>q is a Markov process with initial 

■ - - ' density fig (x) and Markov transition density fg {x'\x), i.e. 

Xo Hg {■) and Xn\ (Xn-i = Xn-i) ^ fe {-l Xn-i) , n>l. (1.1) 
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It is assumed that the observations {^}„>o conditioned upon {X„}^^>q are statistically 
independent and have marginal density gg {y\x), i.e. 

Yn\ ({Xfc}fc>o = {xfe}fc>o) ~ 5e (-1 Xn) ■ (1.2) 

We also assume that fJ-g{x), fg{x\x') and gg{y\x) are densities with respect to (w.r.t.) 
suitable dominating measures denoted generically as dx and dy. For example, if Af C RP 
and 3^ C M'J then the dominating measures could be the Lebesgue measures. The variable 
6 in the densities of these random variables are the particular parameters of the model. 
The set of possible values for 9 is denoted 6. The model (|l.ip - (|1.2|) is also referred to as a 
Hidden Markov Model (HMM) in the literature. 

For any sequence {zn}n& ^'^^ integers j > i, let Zi-j denote the set {zi, Zi+i, Zj}. 
(When j < i this is to be understood as the empty set.) Equations (jl.ip and ()1.2p define 
the joint density of (Xo:„, yb:n), 

n n 
:n) yO:n) — Xklxk-ijYlgeiVklxk) (1.3) 

k=l k=0 

which yields the marginal likelihood, 

P9 (yO:n) = J PS (^0:n, VO-.n) dXQ-.n- (1-4) 

Let Sfc : X A' — >• M, /c G N, be a sequence of functions and Sn ■ X'^ — t- M, n G N, be the 
corresponding sequence of additive functionals constructed from Sk as follow^] 

n 

Sn{xO:n) = ^Sk{Xk-l,Xk) . (1.5) 
k=l 

There are many instances where it is necessary to be able to compute the following expec- 
tations recursively in time, 

cS^ = E9[S„(Xo:„)|yO:n]. (1.6) 

The conditioning implies the expectation should be computed w.r.t. the density of X^-n 
given lo:n = 2/0:n, i.e. Pe ( ^0:n| VQ-.n 

) oc Pe (xo :n)2/0:n) and for this reason 5^ is referred to as 

a smoothed additive functional. 

As the first example of the need to perform such computations, consider the problem of 
computing the score vector, V logpe (yo-.n)- The score is a vector in M"^ and its i^^ component 
is 

rT7i / M g log Pe (yo-.n) ,^ „^ 

[V logpe (yo:„)]i = . (1.7) 

Using Fisher's identity, the problem of computing the score becomes an instance of ()1.6p . 



I.e. 



Vlogpe (?/o:„) = [Vlog/e [Xk\Xk-\)\yQ,r\ + X]^^ [Vlogfifg {yk\ Xk)\ yom] 

k=\ k=0 

+ Eg[Vlogfie{Xo)\yo:n]. (1.^ 



^Incorporating dependency of on j/^, i.e. Sn is the sums of term of the form Sk {xk-i, Xk, Vk), is merely 
a matter of redefining in the computations to follow. 
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An alternative representation of the score as a smoothed additive functional based on in- 
finitesimal perturbation analysis is given in |12] . The score has applications to Maximum 
Likelihood (ML) parameter estimation |32j . |37| . 

The second example is ML parameter estimation using the Expectation-Maximization 
(EM) algorithm. Let yom be a batch of data and the aim is to maximise pg (yo:n) w.r.t. 6. 
Given a current estimate 6', a new estimate 9" is obtained by maximizing the function 

n n 

Q (^''^) = J2^e' [log/e iXk\ Xk^i)\ yo:n] + J^Ee' [logge {yk\ Xk)\ yo-.n] 

k=l k=0 

+ ¥.0' [log fie {Xo)\ yo:n] 

w.r.t. 6 and setting 9" to the maximising argument. A fundamental property of the EM 
algorithm is pg" (yo-.n) ^ Pe' {yo-.n)- For linear Gaussian models and finite state-space HMM, 
it is possible to perform the computations in the definition of Q{9',9). For general non- 
linear non-Gaussian state-space models of the form (|l.ip - (jl.2p . we need to rely on numerical 
approximation schemes. 



1.2 Current approaches to smoothing with SMC 

SMC methods are a class of algorithms that sequentially approximate the sequence of pos- 
terior distributions {pe (dxo-nlyo-.n)} n>o using a set of N weighted random samples called 
particles. Specifically, the SMC approximation of pg {dxQ:n\yo-.n), for n > 0, is 

N N 

Pe {dxo-.n\yo-.n) ■■= V (dxo-.n) , wj,'^ > 0, V VF» = 1, (L9) 

^ ' ^O-.n ^ ' 

i=l i=l 
(i) (i) 

where Wn is the importance weight associated to particle and 5 {») is the Dirac 

(i) 

measure with an atom at Xq.^. The particles are propagated forward in time using a 
combination of importance sampling and resampling steps and there are several variants of 
both these steps; see [13], [E] for details. SMC methods are parallelisable and flexible, the 
latter in the sense that SMC approximations of the posterior densities for a variety of SSMs 
can be constructed quite easily. SMC methods were popularized by the many successful 
applications to SSM. 



1.2.1 Path space and fixed-lag approximations 

A SMC approximation of may be constructed by replacing pg {dxo-.n\yo-.n) in Eq. (jl.6p 
with its SMC approximation in Eq. (11.90 - we call this the path space method since the 
SMC approximation oi pg {dxo:n\yo:n) , which is a probability distribution on is used. 

Fortunately there is no need to store the entire ancestry of each particle, i.e. |^o-n| ' 

which would require a growing memory. Also, this estimate can be computed recursively. 
However, the reliance on the approximation of the joint distribution pg (dxoml yo-.n) is bad. 
It is well-known in the SMC literature that the approximation of pg {dxo-.n \ yo-.n) becomes 
progressively impoverished as n increases because of the successive resampling steps [3], [13] . 
[34j . That is, the number of distinct samples representing pg {dxQ-k \ yo-.n) for any fixed k < n 
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diminishes as n increases ~ this is known as the particle path degeneracy problem. Hence, 
whatever being the number of particles, pg {dxQ.j^l yo.n) will eventually be approximated by a 
single unique particle for all (sufficiently large) n. This has severe consequences for the SMC 
estimate S^. In [13], under favourable mixing assumptions, the authors established an upper 
bound on the error which is proportional to n^/\/iV. Under similar assumptions, it was 
shown in |37] that the asymptotic variance of this estimate increases at least quadratically 
with n. To reduce the variance, alternative methods have been proposed. The technique 
proposed in relies on the fact that for a SSM with "good" forgetting properties, 

Pe ixO:k\ yO:n) ~ PO {xO:k\ yO:min(fc+A,n)) (1-10) 

when the horizon A is large enough; that is observations collected after times k + A bring lit- 
tle additional information about X^-^k- (See [El Corollary 2.9] for exponential error bounds.) 
This suggests that a very simple scheme to curb particle degeneracy is to stop updating the 
SMC estimate beyond time k + A. This algorithm is trivial to implement but the main 
practical problem is that of determining an appropriate value for A such that the two den- 
sities in Eq. p.lOj) are close enough and particle degeneracy is low. These are conflicting 
requirements. A too small value for the horizon will result in pg [xo-k\ yo:min(A;+A.n)) being a 
poor approximation of pg (a:;o:fc| Uo-.n) but the particle degeneracy will be low. On the other 
hand, a larger horizon improves the approximation in Eq. (jl.lOp but particle degeneracy 
will creep in. Automating the selection of A is difficult. Additionally, for any finite A the 
SMC estimate of will suffer from a non vanishing bias even as N ^ oo. In [33], for an 
optimized value of A which is dependent on n and the typically unknown mixing properties 
of the model, the SMC estimates of based on the approximation in Eq. (jl.lOp were shown 
to have an error and bias upper bounded by quantities proportional to n\ogn/y/N and 
n\ogn/N under regularity assumptions. 

The computational cost of the SMC approximation of computed using either the path 
space method or the truncated horizon method of [29] is O (N). 

1.2.2 Approximating the smoothing equations 

A standard alternative to computing is to use SMC approximations of fixed-interval 
smoothing techniques such as the Forward Filtering Backward Smoothing (FFBS) algo- 
rithm [18], [26]. Theoretical results on the SMC approximations of the FFBS algorithm 
have been recently established in [T7]; this includes a central limit theorem and exponen- 
tial deviation inequalities. In particular, under appropriate mixing assumptions, the authors 
have obtained time-uniform deviation inequalities for the SMC-FFBS approximations of the 
marginals {pe idxk\yo:n)}o<k<n 113 Section 5]; see [TJ] for alternative proofs and comple- 
mentary results. Let denote the SMC-FFBS estimate of 5^. In this work it is established 
that the asymptotic variance of \/iV — only grows linearly with time n; a fact which 

was also alluded to in [15]. The main advantage of the SMC implementation of the FFBS 
algorithm is that it does not have any tuning parameter other than the number of parti- 
cles N. However, the improved theoretical properties comes at a computational price; this 
algorithm has a computational complexity of O {N"^) compared to O (N) for the methods 
previously discussed. (It is possible to use fast computational methods to reduce the com- 
putational cost to C'(A^logA^) [30].) Another restriction is that the SMC implementation 
of the FFBS algorithm does not yield an online algorithm. 
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1.3 Contributions and organization of the article 

The contributions of this article are as follows. 

• We propose an original online implementation of the SMC-FFBS estimate of 5^. A 
particular case of this new algorithm was presented in [36], [37] to compute the score 
vector ()1.7p . However, because it was catered to estimating the score, the authors 
failed to realise its full generality. 

• An upper bound for the non- asymptotic mean square error of the SMC-FFBS estimate 
5^ of is derived under regularity assumptions. It follows from this bound that the 

asymptotic variance of \^ (^S^ — S^^ is bounded by a quantity proportional to n. 

This complements results recently obtained in [15], |17j . 

• We demonstrate how the online implementation of the SMC-FFBS estimate of can 
be applied to the problem of recursively estimating the parameters of a SSM from data. 
We present original SMC implementations of Recursive Maximum Likelihood (RML) 
[32], [36], [39] and of the online EM algorithm [25], [22], [23], [33], [8], [28l Section 
3.2.]. These SMC implementations do not suffer from the particle path degeneracy 
problem. 

The remainder of this paper is organized as follows. In Section [2] the standard FFBS 
recursion and its SMC implementation is presented. It is then shown how this recursion 
and its SMC implementation can be implemented exactly with only a forward pass. A 
non-asymptotic variance bound is presented in Section [3l Recursive parameter estimation 
procedures are presented in Section H] and numerical results are given in Section [5l We 
conclude in Section [6] and the proof of the main theoretical result is given in the Appendix. 



2 Forward smoothing and SMC approximations 

We first review the standard FFBS recursion and its SMC approximation jl8j . [26] • This 
is then followed by a derivation of a forward-only version of the FFBS recursion and its 
corresponding SMC implementation. The algorithms presented in this section do not depend 
on any specific SMC implementation to approximate {pg {dxn\yo:n)}n>o- 

2.1 The forward filtering backward smoothing recursion 

Recall the definition of in Eq. (|1.6p . The standard FFBS procedure to compute 
proceeds in two steps. In the first step, which is the forward pass, the filtering densities 
{pe {xk \ yo:k)}o<k<n are computed using Bayes' formula: 

, I s ge{yk+i\xk+i) J fe{xk+i\xk)p9{xk\yo:k)dxk 
Pe[.Xk+i\yo:k+i) - 



fge {yk+i\ x'^+J fe 4) Pe (4| yo-.k) dx^ 



fc+i 



The second step is the backward pass that computes the following marginal smoothed den- 
sities which are needed to evaluate each term in the sum that defines S^: 

P9{xk-.l,Xk\yO:n) =Pe{xk\yO:n)P9ixk-l\yO:k-l,Xk) , I < k < n. (2.1) 



5 



where 

/ I X feixk\xk-i)peixk-i\yo:k-i) „x 

pe{xk~i\yo:k~i,Xk) = 7 — . T . (2.2) 

Pe{xk\yo:k-i) 

We compute Eq. ()2.ip commencing at k = n and then, decrementing k each time, until 
k = 1. (Integrating Eq. (j2.ip w.r.t. Xk wih yield (xfc_i| yo-n) which is needed for the next 
computation.) To compute S^, n backward steps must be executed and then n expectations 
computed. This must then be repeated at time n + 1 to incorporate the effect of the new 
observation y^+i on these calculations. Clearly this is not an online procedure for computing 

{Sn}n>l- 

The SMC implementation of the FFBS recursion is straightforward [18] . In the forward 
pass, we compute and store the SMC approximation pe {dxk \ yo-.k) of pe {dxk \ yo-.k) for k = 
0,1, ... ,n. In the backward pass, we simply substitute this SMC approximation in the place 
of pg {dxk\ yo:k) in Eq. (f2TT]) . Let 



N 

Pe ( dxk I yo:n) = ^kWi^x'^'^ 
1=1 



be the SMC approximation of pg {dxk \ yo-.n), k < n, initialised at fc = n by setting W^^^ 
Wji \ By substituting pg {dxk-i\yo:k-i) for pg {xk-i\ yo-.k-i) in Eq. ^2^, we obtain 

Ell W^ljg ^4.)^ (dxk^i) 
pg{dxk^i\yo:k^i,Xk) = 77: 7 ^ . (2.4) 



This approximation is combined with pg {dxk\ yo-n) (see Eq. ()2.ip ) to obtain 

Pe{dXk^l:k\yO:n) = 2^ 2^ / (,) , ^ (/) ^ (^^^-1^^) " (^.5) 

i=i j=i ^k-iJe l^fc-i, 



Marginalising this approximation will give the approximation to pg {dxk~i\yQ-n), that is 



The SMC estimate 5^ of 5^ is then given by 

5^ = ^ / Sk{Xk-l,Xk)pg{dXk-l:k\yO:n)- (2-7) 



k=l^ 



The backward recursion for the weights, given in Eq. (j2.6p . makes this an off-line algorithm 
for computing 5^. 
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2.2 A forward only version of the forward filtering backward smoothing 
recursion 

To circumvent the need for the backward pass in the computation of S^, the following 
auxiliary function (on X) is introduced, 

Tn (Xn) ■■= J Sn (xQ-.n) Pd (^Oin-ll yO:n-l,Xn) dxo:n-l- (2.8) 

It is apparent that 

Sn = j {Xn) Pd {Xn\ VO-.n) dXn- (2.9) 

The following proposition establishes a forward recursion to compute {r^}n>0) which is 
henceforth referred to as the forward smoothing recursion. For sake of completeness, the 
proof of this proposition is given. 

Proposition 2.1 For n > 1, we have 

Tn (Xn) = j Tn-l (Xn-l) + Sn {Xn-l,Xn) Pe {Xn-l\ yO:n-l, Xn) dXn-1, (2.10) 

where Tq (xq) := 0. 

Proof. The proof is straightforward 

Tn {Xn) •= / ["Sji— 1 {xo-n—l) + Sn {Xn—ljXn )] Pe {xO:n~-l\yO:n-l,Xn) dxo;n-l 

Pe ( 



Sn-l {xO:n-l)Pe (a;0:n-2| y0:n-2, Xn-l) dXQ-.n- 
+ I Sn{Xn-l,Xn) Pe {Xn-l\ yO:n-l, Xn) dXn-l- 



The integrand in the first equality is Sn (a;o:n) while the integrand in the first integral of the 
second equality is (x„_i). ■ 

This recursion is not new and is actually a special instance of dynamic programming for 
Markov processes; see for example [5]- For a fully observed Markov process with transition 
density {/^i (x^j Xfc_i)}^>^, the dynamic programming recursion to compute the expecta- 
tion of Sn {xo:n) with respect to the law of the Markov process is usually implemented 
using a backward recursion going from time n to time 0. In the partially observed sce- 
nario considered here, {-^fc}o<fc<n conditional on 7/o:n is a "backward" Markov process with 
non-homogeneous transition densities {pe (xfc_i| yo-.k-i^ Xk)} i^j^^n- Thus ()2.10p is the corre- 
sponding dynamic programming recursion to compute 5^ with respect to pe (xoml yo-.n) for 
this backward Markov chain. This recursion is the foundation of the online EM algorithm 
and is described at length in [21] (pioneered in [ID]) where the density pQ (x„_i| yQ-,n-i,Xn) 
appearing in (xn) is usually written as 

, I s fe{Xn\Xn-l)PeiXn-l,yO:n~l) 

Pe [Xn-l\ yO:n-l,Xn) " 



/ fe {Xn\Xn~l)pe (x„_i,yo:n-l) dXn~ 
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or as in Eq. (j2.2p in [S], [5], [T5]. The forward smoothing recursion has been rediscovered 
independently several times; see [27], [33] for example. 

A simple SMC scheme to approximate can be devised by exploiting equations (j2.9p 
and ()2.10p . This is summarised as Algorithm SMC-FS below. 



Algorithm SMC-FS: Forward-only SMC computation of the FFBS estimate 
• Assume at time n— 1 that SMC approximations j , -^^l^li 1 of {dxn-i\ ?/o n-i) 

L J l<i<Af 

i of pe {dxn\ yo:n) and set 



• At time n, compute the SMC approximation \ Wn\x^^'^ 













+ Sn 











TV 



1 < i < iV, 
(2.11) 
(2.12) 



This algorithm is initialized by setting Tq {^X^^^ = for 1 < i < A^. It has a compu- 
tational complexity of O (A^) which can be reduced by using fast computational methods 



30^. 



The rationale for this algorithm is as follows. By using pg (dx„-i| UQ-n-i^Xn) defined in 
Eq. (12. 4p in place oipe ((ix„_i| yo:n-i> 3;„,) in Eq. (I2.10p . we obtain an approximation {xn) 
of (xn) which is computed at the particle locations | I . The approximation of 

5^ in Eq. (|2.12p now follows from Eq. (j2.9p by using pg (dx„| yo:n.) in place of po {dxn \ yo-.n)- 
It is valid to use the same notation for the estimates in Eq. ()2.7|) and in Eq. ()2.12p 
as they are indeed the same. The verification of this assertion may be accomplished by 
unfolding the recursion in Eq. (j2.1ip . 



3 Theoretical results 

In this section, we present a bound on the non-asymptotic mean square error of the estimate 
of S^. For sake of simplicity, the result is established for additive functionals of the type 

n 

Sn{xO:n) = ^Sk{Xk) (3.1) 
fe=0 

where Sfc : A" — )• M, and when Algorithm SMC-FS is implemented using the bootstrap 
particle filter; see [7], [19j for a definition of this "vanilla" particle filter. The result can be 
generalised to accommodate an auxiliary implementation of the particle filter [6], [17], |35j . 
Likewise, the conclusion is also valid for additive functionals of the type in (|1.5p : the proof 
uses similar arguments but is more complicated. 



8 



The following regularity condition will be assumed. 

(A) There exist constants < p, 5 < oo such that for all x,x' £ JY, y £ y and 9 £ Q, 



< fe [x'\x) < p, < ge {y\x) < 6. 

Admittedly, this assumption is restrictive and typically holds when X and 3^ are finite or 
are compact spaces. In general, quantifying the errors of SMC approximations under weaker 
assumptions is possible [17]. (More precise but complicated error bounds for the particle 
estimate of are also presented in jl5j under weaker assumptions.) However, when (A) 
holds, the bounds can be greatly simplified to the extent that they can usually be expressed 
as linear or quadratic functions of the time horizon n. These simple rates of growth are 
meaningful as they have also been observed in numerical studies even in scenarios where 
Assumption A is not satisfied (37]. 

For a function s : — )• M, let ||s|| = sup^^;^ The oscillation of s, denoted osc(s), is 

defined to be sup {|s(x) — s{y)\ ; x,y £ X}. The main result in this section is the following 
non-asymptotic bound for the mean square error of the estimate of given in Eq. 

(EUD. 

Theorem 3.1 Assume (A). Consider the additive functional Sn in (3. ij) with \\sk\\ < oo 
and osc{sk) < 1 for < k < n. Then, for any n > and 9 £ Q, 



E 



2\ n+1 ^ + 



where a is a finite constant that is independent of time n, 9 and the particular choice of 
functions {sfc}o<fc<n- 

The proof is given in the Appendix. It follows that the asymptotic variance of 
\fN — , i.e. as the number of particles N goes to infinity, is upper bounded by 

a quantity proportional to (n + 1) as the bias of the estimate is 0{1/N) jl5|. Corollary 5.3]. 

Let TZn denote the SMC estimate of obtained with the standard path space method. 
This estimate can have a much larger asymptotic variance as is illustrated with the following 
very simple model. Let fe {x'\x) = pe {x'), i.e. {Xn}n>o is an i.i.d. sequence, and let yk = V 
and Sk = s for all A: > 1 where s is some real valued function on X, and sq = 0. It can be 

easily established that the formula for the asymptotic variance of VlV (tZ^ — S^^ given in 
[TT] . [a eqn. (9.13), page 304 ] simplifies to 



n [ M^^IlMLd, + [ !lll^dx I se {xf .e (x| y) dx (3.3) 

where 

/^e {x)9e {y\ x) 



TTg {x\ y) 



f He {x') ge {y\ x') dx' ' 
(x) = s (x) — / s {x) TTg {x\ y) dx. 
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Thus the asymptotic variance increases quadratically with time n. Note though that the 
asymptotic variance of \/iV (n~^TZ^ — n~^S^] converges as n tends to infinity to a positive 



constant. Thus path space method can provide stable estimates of [n ^Sn (^o:n)| Uo-.n] , 
i.e. when the additive functionals are time-averaged. Let 



n-l 



k=l i=k+l 

where {7n}„>i is a positive non-increasing sequence that satisfies the fohowing constraints: 
Zln^n = oo and Xln^n < o^- When 7^ = then S^^n{xo:n) = Sn {xo.n) ■ One 
important choice for recursive parameter estimation (see Section H]) is 

7n = n"", 0.5 < a < 1. (3.4) 

It is also of interest to quantify the stabihty of the path space method when apphed to 
estimate = Eg [S^^n {Xo;n) \ Uo-.n] in this more general time-averaging setting. Once again 
let iZ^ denote the SMC estimate of Eg [S^^n {Xq-u)] Uo-.n] obtained with the standard path 
space method. Using the formula for the asymptotic variance of y/N (iZf^ — given in 
[llj . \14\ eqn. (9.13), page 304 ] it can be verified that this asymptotic variance is 

l2 



f [TTeix\u)sg{x)] ^^^^^2 TT ^.2 

\ ) j^^^ i=k+i 

— — ——dx / (2;)V0(x|y) da; VV] 7,^(1 -7i+i)^---(l -7n)' 



It follows from Lemma IA.4I in Appendix that any accumulation point of this sequence (in 
n) has to be positive. In contrast, the asymptotic variance of \fN{Sf^ — 5^), i.e. when 
Eg [5*^,71 (-'^Oin)! UO:n\ IS Computed using Algorithm SMC-FS, will converge to zero as n tends 
to infinity. 



4 Application to SMC parameter estimation 

An important application of the forward smoothing recursion is to parameter estimation for 
non-linear non-Gaussian SSMs. We will assume that observations are generated from an 
unknown 'true' model with parameter value 9* G G, i.e. Xn\{Xn-i = Xn-i) ~ fe-*{-\xn-i) 
and 1^,|(X„ = x„) ~ ge*{-\xn)- The static parameter estimation problem has generated a 
lot of interest over the past decade and many SMC techniques have been proposed to solve 
it; see [28] for a recent review. 



4.1 Brief literature review 

In a Bayesian approach to the problem, a prior distribution is assigned to and the sequence 
of posterior densities {p{9-,XQ-n\uo-.n)}n>o is estimated recursively using SMC algorithms 
combined with Markov chain Monte Carlo (MCMC) steps [I], [21], [38]. Unfortunately 
these methods suffer from the particle path degeneracy problem and will result in unreliable 
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estimates of the model parameters; see [3], [M] for a discussion of this issue. Given a fixed 
observation record yo-.m an alternative offline MCMC approach to estimate p {0 , xo^nluo-.n) 
has been recently proposed which relies on proposals built using the SMC approximation of 

Pe{xo:n\yO:n) E]- 

In a ML approach, the estimate of 9* is the maximising argument of the likelihood of 
the observed data. The ML estimate can be calculated using a gradient ascent algorithm 
either offline for a fixed batch of data or online [32]; see Section 14.21 Likewise, the EM 
algorithm can also be implemented offline or online. The online EM algorithm, assuming 
all calculations can be performed exactly, is presented in [25], [22], [23], [33] and [9]. For 
a general SSM for which the quantities required by the online EM cannot be calculated 
exactly, an SMC implementation is possible [8], |28l Section 3.2.]; see Section [4.31 

4.2 Gradient ascent algorithms 

To maximise the likelihood peivo-.n) w.r.t. 9, we can use a simple gradient algorithm. Let 
be the sequence of parameter estimates of the gradient algorithm. We update the 
parameter at iteration i + 1 using 

9i+i = 6'i + 7i+i Vlogpe {yG:n)\B=e, 

where Vlogpe {yo:n)\e=Q. is the score vector computed at 9 = 9i and {7t}i>i is a sequence of 
positive non-increasing step-sizes defined in (j3.4p . For a general SSM, we need to approxi- 
mate Vlogpe {yo:n)\g=g - As mentioned in the introduction, the score vector admits several 
smoothed additive functional representations; see Eq. ()1.7p and [12]. Using Eq. (jl.7p . it is 
possible to approximate the score with Algorithm SMC-FS. 

In the online implementation, the parameter estimate at time n + 1 is updated according 

to [1], m 

9n+l = 9n + log pgg.^{yn\yO:n-l) (4.1) 

Upon receiving y„, 0„ is updated in the direction of ascent of the predictive density of this 
new observation. A necessary requirement for an online implementation is that the previous 
values of the model parameter estimates (other than 9n) are also used in the evaluation of 
Vg log pg{y 

n\yo:n—i) at — 9n- This is indicated in the notation V log p^p.^ (y^j 

|?/0:n-l)- (Not 

doing so would require browsing through the entire history of observations.) This approach 
was suggested by [32] for the finite state-space case and is named RML. The asymptotic 
properties of this algorithm (i.e. the behavior of 0n in the limit as n goes to infinity) have 
been studied in the case of an i.i.d. hidden process by |39] and for an HMM with a finite 
state-space by [32]. Under suitable regularity assumptions, convergence to 9* and a central 
limit theorem for the estimation error has been established. 

For a general SSM, we can compute a SMC estimate of V logpg^^,^{yn\yo:n-i) using 
Algorithm SMC-FS upon noting that Vlogpe)„.^(y„|?/o:n-i) is equal to 

VlogP0o^„(?/O:n) - Vlogp0o^„_,(yo:n-l)• 
In particular, at time n, a particle approximation Iw^^^Xj^^H of {dXn\yo-n) is 

I J i<i<N 

computed using th.G particle approximation at time n — 1 and. parameter value — Ofi. 
Similarly, the computation of Eq. (12. lip is performed using — and with. 

) = Vlog/e( 

)\e=e„ + Vlog^e ( 

yn I ^^n) 1 0=0 • 
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The estimate of V logp^g.^ (y„|yo:n-i) is now the difference of the estimate in Eq. (|2.12p 
with the same estimate computed at time n — 1. 

Under the regularity assumptions given in Section [3l it follows from the results in 
the Appendix that the asymptotic variance (i.e. as — t- oo) of the SMC estimate of 
V logpog.^ {yn\yo:n-i) Computed using Algorithm SMC-FS is uniformly (in time) bounded. 
On the contrary, the standard path-based SMC estimate of V logp^,,.^ (y„|yo:r4-i) has an 
asymptotic variance that increases linearly with n. 



4.3 EM algorithms 

Gradient ascent algorithms are more generally applicable than the EM algorithm. However, 
their main drawback in practice is that it is difficult to properly scale the components of 
the computed gradient vector. For this reason the EM algorithm is usually favoured by 
practitioners whenever it is applicable. 

Let be the sequence of parameter estimates of the EM algorithm. In the offline 

approach, at iteration i + 1, the function 

Q{0i,6) = j \-OgPe{xQ:n,yQ:n) PeAxO:n\yO:n)dxo;ri 

is computed and then maximized. The maximizing argument is the new estimate ^j+i. 
If pe{xQ:myo:n) bclougs to the exponential family, then the maximization step is usually 
straightforward. We now give an example of this. 

Let : X X 3; ^ M, / = 1, . . . , ?n, be a collection of functions with corresponding 
additive functionals 

n 

Sl,nixO:n,yO:n) = {xk-l,Xk, yk) , I < I < m, 

k=l 

and let 



■Sl,n = J Sl^n{xO:n,yO:n)P9{xO:n\yO:n)dxo., 



The collection {'5^^„}i</<-m is also referred to as the summary statistics in the literature. 
Typically, the maximising argument of Q{6i,9) can be characterised explicitly through a 
suitable function A : M™ — )• 0, i.e. 

A (n~'S?:) (4.2) 



where = Sf^. As an example of this, consider the following stochastic volatility model 
[35]. 

Example 4.1 The stochastic volatility model is a SSM defined by the following equations: 

Yn = Pe^v{Xj2)Wn, 

where {Vn}neM o,nd {Wn}n>o ore independent and identically distributed standard normal 
noise sequences, which are also independent of each other and of the initial state Xq. The 
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model parameters 6 = ((/), cr^, /3^) G M x (0, oo) x (0, oo) are to be estimated. To apply the 
EM algorithm to this model, let 



S {Xn—lj Vn) — ip^n) j ^ (^^n— 15 ^rt) Un) — Vn ( X^i) ■ 

For large n, we can safely ignore the terms associated to the initial density fig (x) and the 
solution to the maximisation step is characterised by the function 



A{zi,Z2,Z3,Zi) = 




The SMC implementation of the forward smoothing recursion has advantages even for 
the batch EM algorithm. As there is no backward pass, there is no need to store the particle 
approximations of {pe{dxk\yo:k)} k=o n' which can result in a significant memory saving for 
large data sets. 

In the online implementation, running averages of the sufficient statistics are computed 
instead [8], [22], [23], [25], [28l Section 3.2.], [33]. Let {^fc}o<fe<n be the sequence of parameter 
estimates of the online EM algorithm computed sequentially based on yo:n- When yn+i is 
received, for each / = 1, . . . , m, compute 

5i,n+l = 7n+l / {Xn, Xn+l,yn+l) Peo:„{Xn, Xn+l\yO:n+l)dXn:n+l 

+ (l-7n+l) /ELl f n (l-7i) ) Iks'- {Xk-l,Xk,yk) Pefyui^0:n\y0:n+l)dX0:n, 
\i=k+l J 

(4.3) 

and then set 

On+l = A (Sn+l) 

where [5n+i]« = <Si,n+i- Here {7n}n>i is a step-size sequence satisfying the same conditions 
stipulated for the RML in Section 14.21 (The recursive implementation of Si^n+i is standard 
[4].) The subscript 6'o:n on peo.,„.i{xo 

■.n\yo:n) indicates that the posterior density is being 
computed sequentially using the parameter 9k~i at time k (and Oq at time 0.) References 
[9], [22], |25| chapter 4] and j33j have proposed an online EM algorithm, implemented as 
above, for finite state HMMs. In the finite state setting all computations involved can be 
done exactly in contrast to general SSMs where numerical procedures are called for. It is 
also possible to do all the calculations exactly for linear Gaussian models [23]. 

Define the vector valued function s : ?(! x X x y ^ M*" as follows: s = [s^, . . . , s™]'^. 
Computing Sn sequentially using SMC-FS is straightforward and detailed in the following 
algorithm. 



SMC-FS implementation of online EM 

At time ra = 

• Choose ^0- 

• Set T^'^ = G M™, f = l,...,iV. 

• Construct the SMC approximation {XQ\WQ^}i<i<N of p6»o(da^o|?/o)■ 
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At times n > 1 

• Construct the SMC approximation {xii\Wn^}i<i<N of p0^,.^^^{dxn\yo:n)- 

• For each i = 1, . . . , N, compute 









"(1 - 7n) + 7n S ?/n) 









• Compute Sn = ^jLi Wn^Tn^ and update the parameter, 6n = A ySnj ■ 

It was suggested in [281 Section 3.2.] that the two other SMC methods discussed in Sec- 
tion [L2] could be used to approximate Sn', the path space approach to implement the onhne 
EM was also independently proposed in [8]. Doing so would yield a cheaper alternative to 
Algorithm SMC-EM above with computational cost O (iV), but not without its drawbacks. 
The fixed-lag approximation of [29j would introduce a bias which might be difficult to con- 
trol and the path space approach suffers from the usual particle path degeneracy problem. 
Consider the step-size sequence in (|3.4|) . If the path space method is used to estimate 
Sn then the theory in Section [3] tells us that, even under strong mixing assumptions, the 
asymptotic variance of the estimate of 5„ will not converge to zero for 0.5 < a < 1. Thus it 
will not yield a theoretically convergent algorithm. Numerical experiments in [8] appear to 
provide stable results which we attribute to the fact that this variance might be very small 
in the scenarios considerecH. In contrast, the asymptotic variance of the O (-/V^) estimate 
converges to zero in time n for the entire range 0.5 < a < 1 under the same mixing con- 
ditions. The original O (-/V^) implementation proposed here has been recently successfully 
adopted in [31] to solve a complex parameter estimation problem arising in robotics. 



5 Simulations 



5.1 Comparing SMC-FS with the path space method 

We commence with a study of a scalar linear Gaussian SSM for which we may calculate 
smoothed functionals analytically. We use these exact values as benchmarks for the SMC 
approximations. The model is 

Xo^M (0, dg) , Xn+l = 4>Xn + CJvVn+l, (5.1) 
Yn = cXn + awWn, (5.2) 

where Vn A^(0, 1) and Wn ' AA(0, 1). We compared the exact values of the following 
smoothed functionals 



k=l 



1-1 



■'2,n 



k=l 



yO:n 



3,n 



ik=i 



yO:n 
(5.3) 



^In a Bayesian framework where 9 is assigned a prior distribution and we estimate {p ( a^ji, ^| yo:n)}„>o 
[Hi [211 1 EHl, the path degeneracy problem has much more severe consequences than in the ML framework 
considered here as illustrated in Indeed in the ML framework, the filter {pe (a^n| yo:n)}„>Q will have, 
under regularity assumptions, exponential forgetting properties for any 9 £ & whereas this will never be the 
case for p{x„, 9\ yo:„) . 
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computed at 0* = (0*, a^, c*, u^) = (0.8,0.1, 1.0, 1.0) with the bootstrap filter implementa- 
tion of Algorithm SMC-FS and the path space method. Comparisons were made after 2500, 
5000, 7500 and 10,000 observations to monitor the increase in variance and the experiment 
was replicated 50 times to generate the box-plots in Figure [TJ (AU rephcations used the 
same data record.) Both estimates were computed using = 500 particles. 



0(N) method: 
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0(n2) method: xS^^ 
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O(N^) method 
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0(N'') method: 




2500 5000 7500 10000 



Figure 1: Box plots of SMC estimates of the smoothed additive functions in (j5.3p for a 
linear Gaussian SSM. Estimates were computed with path space method (left column) and 
Algorithm SMC-FS (right column). The long horizontal line intersecting the box indicates 
the true value. 

From Figure [1] it is evident that the SMC estimates of Algorithm SMC-FS significantly 
outperforms the corresponding SMC estimates of the path space method. However one 
should bear in mind that the former algorithm has 0{N'^) computational complexity while 
the latter is 0{N). Thus a comparison that takes this difference into consideration is im- 
portant. From Theorem 13.11 and the discussion after it, we expect the variance of Algorithm 
SMC-FS 's estimate to grow only linearly with the time index compared to a quadratic in 
time growth of variance for the path space method. Hence, for the same computational 
effort we argue that, for large observation records, the estimate of Algorithm SMC-FS is 
always going to outperform the path space estimates. Specifically, for a large enough n, the 
variance of Algorithm SMC-FS's estimate with N particles will be significantly less than 
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the variance of the path space estimate with A'^^ particles. If the number of observations 
is smah then, taking into account the computational complexity, it might be better to use 
the path space estimate as the variance benefit of using Algorithm SMC-FS may not be 
appreciable to justify the increased computational load. 



5.2 Online EM 

Figure [2] shows the parameter estimates obtained using the SMC implementation of online 
EM for the stochastic volatility model discussed in Example 14.11 The true value of the 
parameters were 9* = [cl),a^,/3^) = (0.8,0.1, 1) and 500 particles were used. SMC-EM was 
started at the initial guess = (0.1,1,2). For the first 100 observations, only the E-step 

was executed. That is the step ^„ = A (^'Sn^ , which is the M-step was skipped. SMC-EM 
was run in its entirety for observations 101 and onwards. The step size used was 7„ = 0.01 
for n < 10^ and l/{n — 5 x lO^)"'® for n > 10^. Figure [2] shows the sequence of parameter 
estimates computed with a very long observation sequence. 



1.5r 



P*=1 
)*= 0.8 



1.02 



0.097 

1 500 2000 2500 



(xlO-^) 



Figure 2: Estimating the parameters of the stochastic volatility model with the SMC 
version of online EM, Algorithm SMC-EM. Initial parameter guess = (0.1,1,2). True 
and converged values (average of the last 1000 iterations) are indicated on the left and the 
right of the plot respectively. 



6 Discussion 



We proposed a new SMC algorithm to compute the expectation of additive functionals 
recursively in time. Essentially, it is an online implementation of the FFBS SMC algorithm 
proposed in [18]. This algorithm has an 0{N'^) computational complexity where is the 
number of particles. It was mentioned how a standard path space SMC estimator to compute 
the same expectations recursively in time could be developed. This would have an 0{N) 
computational complexity. However, as conjectured in [37], it was shown here that the 
asymptotic variance of the SMC-FFBS estimator increased linearly with time whereas that 
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of the 0{N) method increased quadratically. The onhne SMC-FFBS estimator was then 
used to perform recm'sive parameter estimation. While the convergence of RML and onhne 
EM have been established when they can be implemented exactly, the convergence of the 
SMC implementation of these algorithms have yet to be established and is currently under 
investigation. 
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A Appendix 

The proofs in this section hold for any fixed 9 and therefore 9 is omitted from the notation. 
This section commences with some essential definitions. 

Consider the measurable space {E,£). Let M{E) denote the set of all finite signed 
measures and V{E) the set of all probability measures on E. Let B{E) denote the Banach 
space of all bounded and measurable functions / equipped with the uniform norm ||/||. Let 
= / i'{dx) f{x), i.e. ;/(/) is the Lebesgue integral of the function / G B{E) w.r.t. 
the measure S M{E). If z/ is a density w.r.t. some dominating measure dx on E then, 
z^(/) = J i'{x) f{x)dx. We recall that a bounded integral kernel M(x, dx') from a measurable 
space {E, 8) into an auxiliary measurable space £') is an operator / i— t- M[f) from B{E') 
into 13 (E) such that the functions 



are iS-measurable and bounded, for any / G B{E'). In the above displayed formulae, dx' 
stands for an infinitesimal neighborhood of a point x' in E' . Let /3(M) denote the Dobrushin 
coefficient of M which defined by the following formula 



where Osci(ii^') stands the set of (^'-measurable functions / with oscillation less than or equal 
to 1. The kernel M also generates a dual operator v i— t- vM from M{E) into Ai{E') defined 
by {iy'M){f) := v{M{f)). A Markov kernel is a positive and bounded integral operator 
M with M(l) = 1. Given a pair of bounded integral operators (Mi,M2), we let (M1M2) 
the composition operator defined by (MiM2)(/) = Mi(M2(/)). For time homogenous 
state spaces, we denote by (= M™~^M = MM™~^) the m-th composition of a given 
bounded integral operator M, with m > 1. 

Given a positive function G on E, let ■ ^ G ^(-E^) ^ ^g(^^) £ ^(-E") be the Bayes 
transformation defined by 




/3(M) := sup {osc(M(/)) ; / G Osci(i^')} 



^G{v){dx) : 



1 



G(x) v{dx) 



v{G) 
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The definitions above also apply if is a density and M is a transition density. In this case 
all instances of ^{dx) should be replaced with v{x)dx and M{x,dx') by M{x,x')dx' where 
dx and dx' are the dominating measures. 

The proofs below will apply to any fixed sequence of observation {yn}n>o and it is 
convenient to introduce the following transition kernels, 

QniXn-l,dXn) = g{yn~l\Xn~l) f iXn\Xn-l)dXn, U > I, 

and Qk,n = Qk+iQk+2 ■ ■ ■ Qn, < k < n, 

with the convention that Qn,n = Id, the identity operator. Note that Qk,ni^) = p{yk:n-i\xk)- 
Let the mapping ■ V{X) — )• V{X), /c > 1, be defined as follows 

Several probability densities and their SMC approximations are introduced to simplify the 
exposition. The predicted filter is denoted by 

r]n{dXn) = p{dXn |2/0:n-l ) 

with the understanding that riQ{dxQ) is the initial distribution of Xq. Let r]^ denote its SMC 
approximation with N particles. (This notation for the SMC approximation is opted for, 
instead of the usual rjn, to make the number of particles explicit.) The bounded integral 
operator D^^n from X into X^~^^ is defined as 

Dk,n{Sn){Xk) ■■= / p{dX0;k~l\xk,y0:k~l) Y\ g{yq\Xg) f {Xq+i\Xg) Sn{xQ:n)dXk+l:n (A.l) 

Dk^n is defined for any pair of time indices k, n satisfying < A; < n with the convention 
that p{xQ-k-i \xk-,yo:k-i) = 1 for /c = and = 1- ^^^^ SMC approximation, D^^, is 

Dk,n{Sn){Xk) '■= I P^idX0:k-l\xk,y0:k-l) Yl diVqlXq) f iXq+l\Xq) Sn{xO:n)dXk+l:n 

\q=k J 

(A.2) 

where {dxQ-k-i \xk,yo:k-i) is the SMC approximation of p{dxo:k-i \xk,yo:k-i) obtained 
from the SMC-FFBS approximation of Section 12. H i.e. 

k 

p^{dXQ;k-l \xk,yQ:k~l) = ^M^.^N_^{Xq,dXq-l) (A. 3) 

q=l 

where the backward Markov transition kernels M„ „n are defined through 

, , , 'niq-\{dXq-x)9{^yq-\\Xq-\)f{Xq\Xq-x) .. ^, 



M, 



It is easily established that the SMC-FFBS approximation oi p{dxk\yQ:n) ^ k < n, is precisely 
the marginal of 

P (dxo 

:n— 1 \Xnj yO:n—l )p{dXn\yQ:n) 
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where p{dxn\yo:n) was defined in (|1.9p . Finally, we define 



p "-i" nnrl P 

^'^■'^ - i?,,„(i) ^^"^ ^'='"-z),yi)- 

The fohowing estimates are a straightforward consequence of Assumption (A). For time 
indices < k < q < n, 

, _ QM(l)(^fc) ^ ,2r2 ^ f Qk,g{Xk,dXq)Qq,n{l){Xg) \ , -A\{l-k) .^ 

Ofc,n — sup — ^ p , pi — — — -- I ^ ^r — P j , V^-OJ 

Qfc,n(l)K) V Qfc,g(<3g,n(l)) / 

and for < k < q, 

Mk,r,{x, dz) < Mfc,,(x', dz) =^ P {Mg,,N_^ . . . M^^^M_^ < (1 - p-4)«-'=+i . (A.6) 

Several auxiliary results are now presented. For any (p G B{X), let 

Vi'{^) = r^^{^)-Mriti){^)- (A.7) 
The following is an almost sure Kintchine type inequality \14:\ Lemma 7.3.3]. 

Lemma A.l LetJ-^ := a ^|x|*^;0 < A; < n, 1 < f < A^, |^ , n > 0, he the natural filtration 

associated with the N -particle approximation model and he the trivial sigma field. For 
any r > 1, there exist a finite (non random) constant a,, such that the following inequality 
holds for all k > and J-j^_i measurable functions ip^ G B{'^) ^-t- osc{ip^) < 1, 



E 



1 



This inequality may be used to derive the following L,. error estimate |14^ Theorem 
7.4.4]. 

Lemma A. 2 For any r > 1, there exists a constant a,, such that the following inequality 
holds for all k > and (p G B{X) s.t. osc[ip) < 1, 

ViVE(|[ry^-r?„]Mf )'<a. g /3 . (A.8) 

A time-uniform bound for (IA.8|) may be obtained by using the estimates in ()A.5P -( lA^ . 
The final auxiliary result is the following. 

Lemma A. 3 For time indices 1 < k < n, 

ritiDtiASn) = {ritiQk){DUSn)) 
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Proof: 

= I r]k^lidXk^i)p^{dXQ.k-2\Xk-l,yO:k-2) n QqiXq^l,dXq) Sn{xO:n) 

= J r]j;_iidxk_i)Qk{xk-i,dxk) 

Xp^{dxo.k-2\Xk-l,yO:k-2) i H Qq{Xq-l,dXg)] S„{xO;n) 

\q=k+l ) 

The result follows upon noting that 

Vk-i{dxk^i)Qkixk^i,dxk) =Vk-iQk{dxk) Mf^^^N_^{xk,dxk-i). 

To prove Theorem l3.H the same semigroup techniques of [14^ Section 7.4.3] are employed. 
Proof: 

The following decomposition is central 



0<A:<n 



with the convention that r]_^D_i^ = r]o{dxo) Y[ Qq{xq-i,dxq), for fc = 0. Lemma [A. 3 1 

q=l 

states that 

and therefore the decomposition can be also written as 

with the convention $o(^^i) = "HOi for k = 0. Let 

Then every term in the r.h.s. of ()A.9P takes the following form 

Vk^k,ni^k,n) _ VkQk,n{^) ^^N fj^^ fcN 



— N 

where the integral operators -D^ „ are defined as follows, 



^ X ^z),y j (A.io) 



^k'ni^n) 



VkQk,n{^) ' 

Finally, using ()A.9P and (jA.lOp , 5„ — 5^ is expressed as 
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where the first order term is 



0<k<n 

and the second order remainder term is 

1 



0<k<n vilDU^) 



The non-asymptotic variance bound is based on the triangle inequahty 



(A.U) 



and bounds are derived below for the individual expressions on the right-hand side of this 
equation. 



Using the fact that |v^^ (^k,n('^kn) ) <" mean and uncorrelated, 



0<k<n 



0<k<n ^ ^ 



(A.12) 



The following results are needed to bound the right-hand side of (|A.12p . First, observe that 



D^^{1) = Qk,n{^)-, and Df^^{l) = „,(!). Now using the decomposition, 



it follows that 



KniSL) < bk,n OSc(Pil(5„,)) 



(A.13) 



For linear functionals of the form (|3.ip , it is easily checked that 



0<q<k 



(Sq) + E Qk,q(.Sq Qq,n{l)) 
k<q<n 



with the convention ^at ... M^^-^ = Id, the identity operator, for q = k. Recalling 
that D^^{1) = Qk,n{^), we conclude that 



0<g<fc 



k<q<n 



'ik,q{Qq,n{y) ^g) 
Qk,q{Qq,n{^)) 



and therefore 



Pk,n{^n) — E [■ 



{Sq)+ E 



0<q<k 



k<q<n 



Qk,q{Qq,ni^) ^q) 
QkMA^)) 
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Thus, 



< Eo<g<fc /5 {^k,r,^_, ■ ■ ■ ^g+l,<) OSc{Sg) + Ek<q<n ^ ( 



Qk,q{xk4Xq)Qq,n{l){Xq) 
Qfc,g(Q9,u(l)) 



OSc(Sg) 

(A.14) 

Using the estimates in (jA.Sp and (jA.Gp for the contraction coefficients, and the estimate in 
(jA.SP for 6fc „, it follows that there exists some finite (non random) constant c such that the 
bound ^ 

(A.15) 



< C 



holds for any pair of time indexes k, n satisfying < A; < n, particle number N and choice 
of functions {s/fc}o<fc<n- The desired bound for ()A.14p is now obtained by combining this 
result with Lemma lA.ll 



0<k<n 

< d{n + 1) 



(A.16) 



where d \s a constant whose value does not depend on {n,N,Sn)- 
Concerning the teim E {R^ {Snf} in ([XH]) . 



0<k<n 



E 



-VN{Vk- Vk) Dk,na) X ^n"" {KniSk,n) 



< 



0<fc<n 

s E 

0<fe<n 

X E 



h,nM 



N {rjk - Vk) Dk,na) X ^Vi" (d^S^J 
N{Vk-Vk)Dk,na) 



1 

2l 2 



1 

4^ 4 



1 



< 



1 



e(n + 1) 



(A.17) 



where e is a constant whose value does not depend on (n, N, Sn)- The second line follows 
from (lA.Sp and the third by the Cauchy-Schwartz inequality. The final line was arrived at 
by the same reasoning used to derive bound ()A.16P and Lemma |A.2[ The assertion of the 
theorem may be verified by substituting bounds ()A.16p and ()A.17p into (|A.lip . 
It is possible to write 

n n n k—1 

n (i-7.)^ + EE^'(i-7m)^---(i-7n)^ 

fe=l i=k+l k=2 i=l 

as the sum in (lA.lSp below. 
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Lemma A. 4 Let a G (0.5, 1] and 7^ = n " for n > 0. Then 



n-1 



liminf V(n+l-i)7f(l- 7m)' •••(!- 7n)' >0. (A.18) 

n— >oo 

i=l 

Proof: 

Let [aj denote the largest integer less than or equal to a. Since the result is obvious for 
a = 1, let a G (0.5, 1). 

n-1 

ln+ Yl {n + l-ihf{l-7i+i?---il-lnf 

i=[n/2\ 

n-1 



i=[n/2\ 
^n+l-Ln/2j 



2(n-i) 



i=i 



(1-A„)2 (1-A, 



where A„ = (1 - 7Ln/2j)' and 



j>0 



(1-A„)2- 



It may be verified that 



and 



Hence the result follows. 



lim ^ = 2-2«-2 

n->cxD (1 — An) 

lim 7^ ^ jXi-^ = 0. 

n— >-cx) ■'— ' 

i>n+l-[n/2j 
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